Drug-target interaction prediction using ensemble learning and dimensionality reduction.

نویسندگان

  • Ali Ezzat
  • Min Wu
  • Xiao-Li Li
  • Chee-Keong Kwoh
چکیده

Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

متن کامل

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

Supervised prediction of drug-target interactions by ensemble learning

Drug-target interaction (DTI) provides novel insights about the genomic drug discovery. The wet experiments of identifying DTIs are time-consuming and costly. Recently, the increase of available data provides the opportunity to the development of computational methods. Although many computational methods have been proposed (such as classification-based methods, graph-based methods and network-b...

متن کامل

Improved Audio Classification Using a Novel Non-Linear Dimensionality Reduction Ensemble Approach

Two important categories of machine learning methodologies have recently attracted much interest in classification research and its applications. On one side, unsupervised and semi-supervised learning allow to benefit from the availability of larger sets of training data, even if not fully annotated with class labels, and of larger sets of diverse feature representations, through novel dimensio...

متن کامل

Prediction of Compound- Protein Interactions with Machine Learning Methods

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the process of drug development. In this chapter the authors review several supervised machine learning methods to predict unknown compoundprotein interactions from chemical structure and genomic sequence information simultaneously. The authors review several kernel-based algorithms from two ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Methods

دوره 129  شماره 

صفحات  -

تاریخ انتشار 2017